Finding Common Motifs with Gaps Using Finite Automata

Authors

  • Pavlos Antoniou
  • Jan Holub
  • Costas S. Iliopoulos
  • Borivoj Melichar
  • Pierre Peterlongo
Abstract

We present an algorithm that uses finite automata to find the common motifs with gaps occurring in all strings of a finite set S = {S1, S2, ..., Sr}. To find these common motifs, we must first identify the factors that exist in each string, so the algorithm begins by constructing a factor automaton for each string Si. To find the factors common to all the strings, the algorithm gathers the factors of every string in one data structure by computing an automaton that accepts the union of these factor automata. From this automaton we create a new factor alphabet. Based on this factor alphabet, a finite automaton is created for each string Si that accepts all sequences of non-overlapping factors of that string. The intersection of these automata yields the finite automaton that accepts all the common subsequences with gaps over the factor alphabet that are present in every string of S. These common subsequences are the common motifs of the strings.
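
The pipeline described in the abstract can be illustrated with a small Python sketch. This is not the authors' automata construction: plain sets of substrings stand in for the factor automata, and a greedy scan stands in for the per-string automata and their intersection. The function names, the `min_len` and `max_parts` limits, and the choice to allow gaps of length zero are all assumptions made for illustration. Restricting the factor alphabet to factors common to every string loses nothing, since every factor of a common motif must occur in every string.

```python
def factors(s, min_len=1):
    """All factors (substrings) of s with length >= min_len.

    Stand-in for the language of a factor automaton (assumption of this sketch).
    """
    return {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}


def common_factors(strings, min_len=1):
    """Factors present in every string: the 'factor alphabet' of this sketch."""
    return set.intersection(*(factors(s, min_len) for s in strings))


def occurs_with_gaps(motif, s):
    """True if the factors of motif occur in s in order and without overlap.

    Greedy leftmost matching suffices: taking the earliest occurrence of each
    factor leaves the most room for the factors that follow.
    Gaps of length zero are allowed here (an assumption).
    """
    pos = 0
    for f in motif:
        hit = s.find(f, pos)
        if hit < 0:
            return False
        pos = hit + len(f)
    return True


def common_motifs(strings, min_len=2, max_parts=2):
    """Enumerate factor sequences (at most max_parts long) found in all strings.

    Plays the role of the intersection automaton, bounded by max_parts so the
    enumeration stays finite (hypothetical limit, not from the paper).
    """
    alphabet = sorted(common_factors(strings, min_len))
    found = []
    frontier = [()]
    for _ in range(max_parts):
        next_frontier = []
        for motif in frontier:
            for f in alphabet:
                candidate = motif + (f,)
                # Only extend motifs that still occur everywhere: any motif
                # that occurs in a string has all of its prefixes occur too.
                if all(occurs_with_gaps(candidate, s) for s in strings):
                    found.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier
    return found


if __name__ == "__main__":
    for motif in common_motifs(["abcxde", "abcyyde", "zabcde"]):
        print(motif)
```

For the three example strings, the factor alphabet with `min_len=2` is {ab, abc, bc, de}, and one of the reported motifs is the pair (abc, de), which appears in each string with a different gap between the two factors.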

Similar resources

Reduction of Computational Complexity in Finite State Automata Explosion of Networked System Diagnosis (RESEARCH NOTE)

This research puts forward rough finite state automata which have been represented by two variants of BDD called ROBDD and ZBDD. The proposed structures have been used in networked system diagnosis and can overcome combinatorial explosion. The implementation uses the CUDD (Colorado University Decision Diagrams) package. A mathematical proof of the claimed complexity is provided which shows ZBDD ...

Communication complexity of promise problems and their applications to finite automata

Equality and disjointness are two of the most studied problems in communication complexity. They have been studied for both classical and quantum communication and for various models and modes of communication. Buhrman et al. [6] proved that the exact quantum communication complexity for a promise version of the equality problem is O(log n) while the classical deterministic communication co...

Inexact Pattern Matching Algorithms via Automata

Pattern matching occurs in various applications, ranging from simple text searching in word processors to identification of common motifs in DNA sequences in computational biology. The problem of exact pattern matching has been well studied and a number of efficient algorithms exist. However, these exact pattern matching algorithms are of little help when they are applied to finding patterns in ...

Multidimensional fuzzy finite tree automata

This paper introduces the notion of multidimensional fuzzy finite tree automata (MFFTA) and investigates its closure properties from the area of automata and language theory. MFFTA are a superclass of fuzzy tree automata whose behavior is generalized to adapt to multidimensional fuzzy sets. An MFFTA recognizes a multidimensional fuzzy tree language which is a regular tree language so that for e...

Construction of minimal DFAs from biological motifs

Deterministic finite automata (DFAs) are constructed for various purposes in computational biology. Little attention, however, has been given to the efficient construction of minimal DFAs. In this article, we define simple nondeterministic finite automata (NFAs) and prove that the standard subset construction transforms NFAs of this type into minimal DFAs. Furthermore, we show how simple NFAs c...

Publication date: 2006